19 research outputs found

    Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model

    Background: Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on controlled vocabularies, in particular, Gene Ontology (GO). However, the annotation of genes is a labor-intensive process; and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered. Results: We propose a statistical method that uses the primary literature, i.e. free-text, as the source to perform overrepresentation analysis. The method is based on a statistical framework of mixture model and addresses the methodological flaws in several existing programs. We implemented this method within a literature mining system, BeeSpace, taking advantage of its analysis environment and added features that facilitate the interactive analysis of gene sets. Through experimentation with several datasets, we showed that our program can effectively summarize the important conceptual themes of large gene sets, even when traditional GO-based analysis does not yield informative results. Conclusions: We conclude that the current work will provide biologists with a tool that effectively complements the existing ones for overrepresentation analysis from genomic experiments. Our program, Genelist Analyzer, is freely available at: http://workerbee.igb.uiuc.edu:8080/BeeSpace/Search.jsp

    The META tool optimizes metagenomic analyses across sequencing platforms and classifiers

    A major challenge in the field of metagenomics is the selection of the correct combination of sequencing platform and downstream metagenomic analysis algorithm, or “classifier”. Here, we present the Metagenomic Evaluation Tool Analyzer (META), which produces simulated data and facilitates platform and algorithm selection for any given metagenomic use case. META-generated in silico read data are modular, scalable, and reflect user-defined community profiles, while the downstream analysis is done using a variety of metagenomic classifiers. Reported results include information on resource utilization, time-to-answer, and performance. Real-world data can also be analyzed using selected classifiers and results benchmarked against simulations. To test the utility of the META software, simulated data was compared to real-world viral and bacterial metagenomic samples run on four different sequencers and analyzed using 12 metagenomic classifiers. Lastly, we introduce “META Score”: a unified, quantitative value which rates an analytic classifier’s ability to both identify and count taxa in a representative sample

    BeeSpace Navigator: exploratory analysis of gene function using semantic indexing of biological literature

    With the rapid decrease in cost of genome sequencing, the classification of gene function is becoming a primary problem. Such classification has been performed by human curators who read biological literature to extract evidence. BeeSpace Navigator is a prototype software for exploratory analysis of gene function using biological literature. The software supports an automatic analogue of the curator process to extract functions, with a simple interface intended for all biologists. Since extraction is done on selected collections that are semantically indexed into conceptual spaces, the curation can be task specific. Biological literature containing references to gene lists from expression experiments can be analyzed to extract concepts that are computational equivalents of a classification such as Gene Ontology, yielding discriminating concepts that differentiate gene mentions from other mentions. The functions of individual genes can be summarized from sentences in biological literature, to produce results resembling a model organism database entry that is automatically computed. Statistical frequency analysis based on literature phrase extraction generates offline semantic indexes to support these gene function services. The website with BeeSpace Navigator is free and open to all; there is no login requirement at www.beespace.illinois.edu for version 4. Materials from the 2010 BeeSpace Software Training Workshop are available at www.beespace.illinois.edu/bstwmaterials.php

    Sickness and health: Homophily in online health forums

    This work explores the link between health and social relations by creating an automated metric of similarity of positive or negative affect (sentiment) between peers in online health forums. We analyze textual communication between peers and demonstrate that those who communicate often have similar average sentiment scores. Sentiment is the author's immediate affective state, their positive or negative orientation. We hypothesize that average sentiment over time indicates overall happiness or sadness, as similar analysis has been used to identify depression and college students at risk of depression. These results follow the analysis of Framingham study data demonstrating that happy people tend to associate with one another and that happiness spreads within social networks

    Exploring machine learning techniques using patient interactions in online health forums to classify drug safety

    This dissertation explores the use of personal health messages collected from online message forums to predict drug safety using natural language processing and machine learning techniques. Drug safety is defined as any drug with an active safety alert from the US Food and Drug Administration (FDA). It is believed that this is the first exploration of patient derived data of this type for pharmacovigilance, the study of drugs once released to market for safety. It is believed that this is the first application of machine learning and natural language processing techniques to be used for pharmacovigilance on patient derived data. We present results demonstrating the identification of drugs withdrawn from market as well as predictions of other potential safety alert drugs. One example includes Meridia, a weight loss drug linked with death for those with cardiovascular disease. The drug is identified based on data presented two years before FDA and European Union (EU) advisory panels were formed and the subsequent withdrawal of the drug from market within the EU and United States

    Visualizing Topics and Social Networks within a Health Newsgroup

    The goal of this project was to develop a visualization that would simultaneously overlay topics extracted from a newsgroup with social networking information. The motivation of this work i